[TensorRT EP] Fix InferenceSession::Run() not thread-safe issue #19301

chilo-ms · 2024-01-29T02:20:01Z

InferenceSession::Run() is guaranteed to be thread-safe, meaning multiple threads can call it concurrently.
The TRT EP therefore has to handle concurrency carefully; otherwise the following concurrency issue can occur:

To perform inference concurrently in multiple streams, it is recommended to use one TRT execution context per stream.
The TRT EP does not use a per-thread context implementation, so when multiple threads call InferenceSession::Run() concurrently, the TRT execution context instance is shared by all threads while each thread acquires a different stream from ORT.
As a result, the TRT EP ends up with one TRT execution context used across multiple streams, which is not recommended.
However, because the whole compute_func() is protected by a lock, enforcing cudaStreamSynchronize() inside it means the shared context only ever serves one stream at a time, which effectively guarantees one TRT execution context per stream.

Therefore, the TRT EP needs to call cudaStreamSynchronize() in compute_func(), that is, wait until the stream has completed all of its operations, to prevent the concurrency issue described above.

GitHub issue: #19275
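
To make the reasoning above concrete, here is a minimal, CPU-only sketch of the lock-plus-synchronize idea. It is not the TRT EP code: `FakeStream`, `SharedContext`, and `RunConcurrently` are hypothetical stand-ins, with `std::async` playing the role of asynchronous work on a CUDA stream, `Enqueue()` standing in for enqueueV3(), and `Synchronize()` standing in for cudaStreamSynchronize(). It only illustrates that waiting for the stream while still holding the lock keeps a shared context from being used by two streams at once.

```cpp
#include <atomic>
#include <chrono>
#include <future>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a CUDA stream: enqueued work runs asynchronously.
struct FakeStream {
  std::future<void> pending;
  // Analogue of cudaStreamSynchronize(): block until enqueued work is done.
  void Synchronize() {
    if (pending.valid()) pending.wait();
  }
};

// Hypothetical stand-in for the single execution context shared by all threads.
struct SharedContext {
  std::atomic<int> in_flight{0};      // streams currently using the context
  std::atomic<int> max_in_flight{0};  // highest overlap observed so far

  // Analogue of enqueueV3(): starts asynchronous work and returns immediately.
  void Enqueue(FakeStream& stream) {
    stream.pending = std::async(std::launch::async, [this] {
      int now = ++in_flight;
      int seen = max_in_flight.load();
      // Record the new maximum if this task raised the overlap count.
      while (now > seen && !max_in_flight.compare_exchange_weak(seen, now)) {
      }
      std::this_thread::sleep_for(std::chrono::milliseconds(5));
      --in_flight;
    });
  }
};

// Simulates `num_threads` concurrent Run() calls and returns the maximum
// number of streams that used the shared context at the same time.
int RunConcurrently(int num_threads, bool sync_inside_lock) {
  SharedContext context;
  std::mutex compute_mutex;  // plays the role of the lock around compute_func()
  std::vector<std::thread> threads;
  for (int i = 0; i < num_threads; ++i) {
    threads.emplace_back([&] {
      FakeStream stream;  // each thread acquires its own stream
      {
        std::lock_guard<std::mutex> lock(compute_mutex);
        context.Enqueue(stream);
        // Waiting here, before the lock is released, is the point of the fix.
        if (sync_inside_lock) stream.Synchronize();
      }
      stream.Synchronize();  // drain remaining work before the thread exits
    });
  }
  for (auto& t : threads) t.join();
  return context.max_in_flight.load();
}
```

In this sketch, `RunConcurrently(4, true)` returns 1, because each thread waits for its own stream before releasing the lock. With `sync_inside_lock` set to `false` the lock only serializes the enqueue calls, so the asynchronous work may overlap and the returned value can exceed 1 depending on timing.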

onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc

chilo-ms added 3 commits January 28, 2024 18:12

update

dd2b915

modify comment

7b69725

lintrunner -a

5ec9e39

chilo-ms requested a review from jywu-msft January 29, 2024 02:39

chilo-ms added 4 commits January 29, 2024 05:50

skip cudaStreamSynchronize when cuda graph is enabled

1f06871

move sync at the end of inference run

4a950ff

update

413d506

put sync after enqueueV3 and make it configurable

c3ff278

chilo-ms marked this pull request as ready for review January 29, 2024 18:39

chilo-ms added the release:1.17.0 label Jan 29, 2024

skip cudaStreamSynchronize when cuda graph is enabled

b4926eb

jywu-msft reviewed Jan 29, 2024

onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc (outdated, resolved)

refactor

f7d3a5a

jywu-msft previously approved these changes Jan 29, 2024

external stream won't have to sync explicitly

d29ef3d

chilo-ms dismissed jywu-msft’s stale review via d29ef3d January 29, 2024 23:18

jywu-msft approved these changes Jan 30, 2024

chilo-ms merged commit 00d0481 into main Jan 30, 2024
95 of 97 checks passed

chilo-ms deleted the chi/trt_ep_concurrent_fix branch January 30, 2024 01:36

chilo-ms mentioned this pull request Jan 30, 2024

tensorRT runtime provider is not thread-safe #19275

Closed

chilo-ms mentioned this pull request Feb 1, 2024

[Performance] The CUDA Stream cannot be set through Python API #19094

Open

yf711 added a commit that referenced this pull request Mar 15, 2024

Revert "[TensorRT EP] Fix InferenceSession::Run() not thread-safe iss…

24d58d6

…ue (#19301)" This reverts commit 00d0481.
